An Ensemble Information Extraction Approach to the BioCreative CHEMDNER Task

Authors

  • Madian Khabsa
  • C. Lee Giles
Abstract

We report on the Penn State team’s experience in the CHEMDNER chemical entity mention and chemical document indexing tasks. Our approach uses a probabilistic framework that combines an ensemble of multiple information extractors to obtain high accuracy. The framework can be configured to optimize for precision, recall, or F-measure, depending on the task requirement. The ensemble includes off-the-shelf chemical entity extractors, along with a version of the ChemXSeer extractor that was trained and modified specifically for this task. Experiments on the training and development datasets reach recall as high as 89% and an F-measure of 73% when optimizing for each measure respectively.
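The abstract does not spell out how the probabilistic framework combines its extractors, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a noisy-OR rule: each extractor has an estimated reliability, and a mention's score is the probability that at least one of the extractors reporting it is right. The extractor names, reliabilities, mentions, and gold set are all hypothetical, and mentions are simplified to plain strings rather than document offsets.

```python
from collections import defaultdict


def combine_mentions(extractor_outputs, reliabilities):
    """Noisy-OR combination (an assumption, not the paper's stated rule).

    score(mention) = 1 - product of (1 - reliability) over the
    extractors that reported the mention.
    """
    miss_prob = defaultdict(lambda: 1.0)
    for name, mentions in extractor_outputs.items():
        for mention in set(mentions):
            miss_prob[mention] *= 1.0 - reliabilities[name]
    return {mention: 1.0 - p for mention, p in miss_prob.items()}


def select(scores, threshold):
    """Keep mentions whose combined score reaches the threshold."""
    return {m for m, s in scores.items() if s >= threshold}


def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def best_threshold(scores, gold, metric):
    """Pick the threshold that maximizes the chosen metric on labeled data."""
    index = {"precision": 0, "recall": 1, "f1": 2}[metric]
    candidates = sorted(set(scores.values()))
    return max(
        candidates,
        key=lambda t: precision_recall_f1(select(scores, t), gold)[index],
    )


# Hypothetical extractors, reliabilities, and data for illustration only.
reliabilities = {"extractor_a": 0.9, "extractor_b": 0.6, "extractor_c": 0.5}
extractor_outputs = {
    "extractor_a": ["aspirin", "ethanol"],
    "extractor_b": ["aspirin", "water", "protein"],
    "extractor_c": ["ethanol", "protein", "benzene"],
}
gold = {"aspirin", "ethanol", "benzene"}

scores = combine_mentions(extractor_outputs, reliabilities)
for metric in ("precision", "recall", "f1"):
    threshold = best_threshold(scores, gold, metric)
    print(metric, threshold, precision_recall_f1(select(scores, threshold), gold))
```

In this sketch, a high threshold keeps only mentions backed by reliable or multiple extractors, which favors precision, while a low threshold keeps everything any extractor found, which favors recall. Tuning the threshold on labeled data is one simple way a single framework could be configured for different target measures.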

Similar resources

Extraction of Chemical and Drug Named Entities by Ensemble Learning Using Chemical NER Tools Based on Different Extraction Guidelines

Chemical named-entity recognition (chemical NER) is the task of extracting chemical information and chemical-related entities such as drug names and source materials from text in several domains such as bioinformatics and nanoinformatics. There have been several attempts to construct corpora for handling such chemical-related information based on different corpus-construction guidelines. Even t...

Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems

The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Chemical entity recognition in patents by combining dictionary-based and statistical approaches

We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistic...

Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

BACKGROUND: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning m...

Publication date: 2013